Training data bias
Query use case
Is there evidence that the data on which this system was trained has (or is free of) bias?
Schemas used
Pseudo code
FUNCTION ai_system_has_data_bias(AI_System_ID)
    CREATE empty list Attestations
    // Step 1: Retrieve dataset verification credentials linked to the AI system
    SET Data_VC_IDs = get dataset verification credentials associated with AI_System_ID
    // Step 2: Identify bias attestations in each dataset verification credential
    FOR EACH Data_VC_ID in Data_VC_IDs DO
        SET Attestations_List = get bias attestations linked to Data_VC_ID
        FOR EACH Attestation in Attestations_List DO
            IF Attestation is of type "bias" THEN
                SET Component_Hash = Attestation's component hash
                SET Bias_Details = Attestation's details
                ADD ({"component": Component_Hash, "data_vc_id": Data_VC_ID}, Bias_Details) TO Attestations
            END IF
        END FOR
    END FOR
    // Step 3: Return all bias attestations found
    RETURN Attestations
END FUNCTION
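The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the system's actual implementation: the two in-memory dictionaries, their keys, and the sample values are all hypothetical stand-ins for whatever store holds the verification credentials.

```python
# Hypothetical in-memory stores standing in for the real credential registry.
# All IDs, field names, and sample values here are assumptions for illustration.
DATA_VCS_BY_SYSTEM = {
    "ai-system-1": ["data-vc-a", "data-vc-b"],
}
ATTESTATIONS_BY_DATA_VC = {
    "data-vc-a": [
        {"type": "bias", "component_hash": "hash-1", "details": "under-represents group X"},
        {"type": "license", "component_hash": "hash-1", "details": "CC-BY-4.0"},
    ],
    "data-vc-b": [],
}


def ai_system_has_data_bias(ai_system_id):
    """Collect every bias attestation on the datasets linked to an AI system."""
    attestations = []
    # Step 1: dataset verification credentials linked to the AI system.
    for data_vc_id in DATA_VCS_BY_SYSTEM.get(ai_system_id, []):
        # Step 2: keep only attestations of type "bias".
        for attestation in ATTESTATIONS_BY_DATA_VC.get(data_vc_id, []):
            if attestation.get("type") == "bias":
                component = {
                    "component": attestation["component_hash"],
                    "data_vc_id": data_vc_id,
                }
                attestations.append((component, attestation["details"]))
    # Step 3: return all bias attestations found.
    return attestations


if __name__ == "__main__":
    for component, details in ai_system_has_data_bias("ai-system-1"):
        print(component, details)
```

In this sketch an empty list means only that no bias attestation was recorded for the linked datasets. It does not by itself show that the data is free of bias.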
Explanation
- Find relevant data sources:
  - Retrieve the configuration verification credential (ConfigVcId) for the AI system.
  - Extract the weights verification credential (WeightsVcId) used in training.
  - Ensure that the WeightsVcId is classified as "Weights".
  - Trace back to the training system that produced these weights.
  - Identify the datapack used in the training process.
- Extract the list of Data Verification Credentials (DataVcIds) used in training from the datapack.
- Identify attestations that indicate bias:
  - For each DataVcId, retrieve its bias attestations.
  - If an attestation is labeled as "bias", extract its component_hash and BiasDetails.
- Return a list of bias attestations:
  - Each entry consists of a tuple:
    - Component information (component hash and DataVcId).
    - Bias details describing the detected bias.
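The tracing step that leads from an AI system to its training datasets can be sketched as a chain of lookups. This is a rough illustration under stated assumptions: the dictionaries, their field names (such as `weights_vc_id`), and the sample IDs are hypothetical and only mirror the relationships described in the explanation.

```python
# Hypothetical lookup tables; every name and ID below is an assumption.
CONFIG_VC_BY_SYSTEM = {"ai-system-1": "config-vc-1"}
CONFIG_VCS = {"config-vc-1": {"weights_vc_id": "weights-vc-1"}}
VC_KINDS = {"weights-vc-1": "Weights"}
TRAINING_SYSTEM_BY_WEIGHTS = {"weights-vc-1": "training-system-1"}
DATAPACK_BY_TRAINING_SYSTEM = {"training-system-1": "datapack-1"}
DATA_VCS_BY_DATAPACK = {"datapack-1": ["data-vc-a", "data-vc-b"]}


def training_data_vc_ids(ai_system_id):
    """Trace an AI system back to the data VCs used to train its weights."""
    # Configuration VC for the AI system.
    config_vc_id = CONFIG_VC_BY_SYSTEM[ai_system_id]
    # Weights VC referenced by that configuration.
    weights_vc_id = CONFIG_VCS[config_vc_id]["weights_vc_id"]
    # The referenced credential must be classified as "Weights".
    if VC_KINDS.get(weights_vc_id) != "Weights":
        raise ValueError(f"{weights_vc_id} is not classified as Weights")
    # Training system that produced the weights, then its datapack.
    training_system_id = TRAINING_SYSTEM_BY_WEIGHTS[weights_vc_id]
    datapack_id = DATAPACK_BY_TRAINING_SYSTEM[training_system_id]
    # Data VCs listed in the datapack (copied so callers cannot mutate the store).
    return list(DATA_VCS_BY_DATAPACK[datapack_id])


if __name__ == "__main__":
    print(training_data_vc_ids("ai-system-1"))
```

Raising an error when the credential is not classified as "Weights" is one possible design choice; a query engine would more likely just fail to match and return no results.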
Query
- ai_system_has_data_bias(AiSystemId, Attestations) (link to query)
- link to simulator
Notes
This assumes that a trusted method exists for identifying bias in a dataset.